System and method for human gesture processing from video input

ABSTRACT

A system and method for an associative interaction framework used for user input within an interaction platform used in an environment that includes collecting image data in the environment; through computer vision analysis of the image data, classifying objects in the environment wherein a plurality of the objects are detected users; for at least one user, detecting an associative interaction event, which includes: through computer vision analysis of the image data, detecting a first object association of the one user with a first object, and initiating an associative interaction event with a set of interaction properties including properties of the user and the first object association; and executing an action response based on the associative interaction event.

CROSS-REFERENCE TO RELATED APPLICATIONS

This Application claims the benefit of U.S. Provisional Application No. 62/558,731, filed on 14 Sep. 2017, which is incorporated in its entirety by this reference.

TECHNICAL FIELD

This invention relates generally to the field of human computer interaction, and more specifically to a new and useful system and method for a vision-based associative interaction framework for human-computer interaction.

BACKGROUND

Computing devices, in particular personal computing devices, are an integral part of modern life. A recent trend in computing has been the emergence of ambient computing, which does not depend on physical interaction by a user with a device. Such devices are still limited in their adoption and are mostly limited to personal or home computing devices. There are numerous forms of user interfaces for when a user directly interacts with a known computing device. Ambient computing is currently still largely limited to a user directing voice or explicit gestures to a personal sensing device. The user typically is aware of the presence of this sensing device and explicitly directs input towards it. In the field of voice-based user interfaces, a user speaks to a known listening device. In the field of computer vision, user interfaces have been created that rely on deliberate gestures expressed by a user to a known camera. However, there are no pre-established intuitive user interfaces for interacting with ambient computing devices, and in particular an interaction framework does not exist for general video surveillance of an environment. For example, there is no existing solution for use in a commercial setting serving tens to hundreds of users simultaneously. Thus, there is a need in the human computer interaction field to create a new and useful system and method for a vision-based associative interaction framework for human-computer interaction. This invention provides such a new and useful system and method.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a schematic representation of a system of a preferred embodiment;

FIGS. 2A-2D are exemplary associative interactions with one or more objects;

FIGS. 3A-3D are exemplary associative interactions involving a contextual object;

FIGS. 4A-4C are exemplary associative interactions involving an active object;

FIG. 5 is an exemplary associative interaction with a gesture modifier;

FIG. 6 is a flowchart representation of a method of a preferred embodiment;

FIG. 7 is a schematic representation of an exemplary implementation used to trigger a product comparison;

FIG. 8 is a schematic representation of an exemplary implementation using a personal device to supply user input and to receive interaction feedback;

FIG. 9 is a schematic representation of an exemplary implementation of receiving data from an active device as part of defining the associative interaction event;

FIG. 10 is a schematic representation of an exemplary implementation of incorporating voice-based user input into associative interaction events;

FIG. 11 is a schematic representation of an exemplary implementation using a context-loaded object to direct the automatic digital order of a product; and

FIG. 12 is a schematic representation of an exemplary implementation of communicating with a user device.

DESCRIPTION OF THE EMBODIMENTS

The following description of the embodiments of the invention is not intended to limit the invention to these embodiments but rather to enable a person skilled in the art to make and use this invention.

1. Overview

A system and method for a vision-based associative interaction framework for human-computer interaction of a preferred embodiment functions to enable CV-driven applications to leverage associative interactions with objects as a form of user input for a digital experience, as shown in FIG. 1.

The associative interaction framework enabled through the system and method is based on the CV-driven detection of a user establishing an object interaction with one or more objects. Object interactions can be hand contact, a hand gesture applied to an object, digital interaction with an object that is a digital device, and/or other suitable forms of interaction. The properties of the object interaction, the type or identity of the objects, the identity or properties of the user, and/or other factors can then be variables in selecting and triggering an appropriate interaction event (i.e., an action).

As an illustrative example, simultaneous contact with two objects using the two hands of a user can establish an associative interaction involving that user and the two objects. An action is then selected and triggered in response to interaction properties like the type of objects, the identity and state of the user, and/or other conditions of the associative interaction. In the exemplary use case of a store, a customer (with or without an accompanying personal computing device like a phone) may be able to perform personalized user input simply by interacting with shelved products in different manners, such as initiating an audio message played over a local speaker system near the customer.

The system and method preferably use CV-driven image analysis and the coordination of multiple devices in enabling a framework of receiving user input based around object associations and appropriately executing a response at one or more devices integrated into the associated system. In an associative interaction framework, interactions can be derived from detectable interactions between two or more objects. Preferably, one of those objects is a user. Another preferred type of object is an object that is or previously was a static object present in the environment, which becomes the subject of an associative interaction based on the explicit actions of a user.

Natural object interactions of a user (like touching, grabbing, holding, pointing, standing in a location, and the like) can be used in controlling a rich set of user inputs simply by changing the number and type of objects interacted with by a user and the current context of the interaction. In a store setting, a user can be provided with a full set of user input functionality in any region monitored by a CV monitoring system and with an object for interaction.

In the case of a user-object interaction, a user can manipulate objects with one or both hands to establish an association with an object—that interaction with an object will trigger an interaction. Non-contact interactions may additionally or alternatively be supported. For example, pointing at an object, gesturing toward an object, directing attention to an object (e.g., directing gaze towards an object or directing the front of the body toward an object), or other non-contact interactions may also be detected and considered as establishing an association with an object. The user could additionally or alternatively use other body parts such as their feet or body. Additionally, the user may be able to control interactions by touching and/or otherwise interacting with different objects in different combinations. The combination of object interactions forms different "user-object topologies" (i.e., interaction topologies) that are used in specifying the object associations.

One exemplary interaction can be a user touching or holding an item. This may be used in triggering the presentation of item-related information through a display or audio output. This may alternatively trigger transparent events like adding a touched product to a list of recently viewed items. Another exemplary interaction can be a user touching or holding two items. In a store environment, an associative interaction by a user with two products may be used to trigger a product comparison of the two held items.
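For illustration only, the following is a minimal sketch of how the size of a detected interaction topology might select a response such as those described above (item information for a single held item, a comparison for two). The handler functions and product identifiers are hypothetical placeholders rather than the claimed implementation.

    # Minimal sketch: selecting an action response from an interaction topology.
    # Handler functions and identifiers below are hypothetical placeholders.

    def show_item_info(user_id, product_id):
        print(f"user {user_id}: displaying info for {product_id}")

    def compare_products(user_id, product_ids):
        print(f"user {user_id}: comparing {product_ids}")

    def record_browsing_event(user_id, product_id):
        print(f"user {user_id}: added {product_id} to recently viewed")

    def dispatch_interaction(user_id, associated_product_ids):
        """Map a detected user-object topology to an action response."""
        if len(associated_product_ids) == 1:
            product = associated_product_ids[0]
            record_browsing_event(user_id, product)  # transparent event
            show_item_info(user_id, product)         # visible feedback
        elif len(associated_product_ids) == 2:
            compare_products(user_id, associated_product_ids)
        # Larger topologies could map to other responses.

    dispatch_interaction("user-42", ["soup-001", "soup-002"])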

As used herein, an associative interaction framework is the general descriptor of the form of gesture interface enabled by the system and method. Within the associative interaction framework, associative interactions are forms of "input". An associative interaction framework enables a rich set of computer interactions that can be based on detectable associative interactions. The interaction framework may include one or more types of associative interactions. The interaction framework is generally described as being used by a user. The interaction framework can enable the detection of multiple interactions possibly performed by multiple users. Additionally, the associative interaction framework may have different sets of associative interactions for different contexts. For example, some sets of associative interactions may only be enabled for a subset of users (e.g., users enabling an advanced type of interactions or users enabling select accessibility-related interactions).

As used herein, interaction topology is the general descriptor for the "network" established during a given associative interaction. A basic interaction topology is the connection of at least one object and a user, but more complex interaction topologies can include multiple objects and even multiple users or agents.

As used herein, an object interaction is used to characterize a CV-detectable event involving at least two objects that signifies an established association. A preferred form of object interaction is hand contact by a user with an object. Other forms of object interaction may also include detecting a user pointing at an object or detecting a user standing in an interaction region of an object (e.g., satisfying an object-proximity condition).

As used herein, an object association characterizes the result of a detectable object interaction that associates two or more objects. Multiple object interactions can establish multiple object associations. Normally, simultaneous object associations are evaluated in determining some interaction response/action. In some variations, object associations may be persistent for some period of time, wherein multiple object associations evaluated as a single associative interaction can be established through distinct non-concurrent object interactions. For example, comparing two products may be a command issued by a user touching a first object and then touching a second object within a specified time window (e.g., in under three seconds).
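One possible way to treat non-concurrent object interactions as a single associative interaction is to keep each user's recent object associations alive for a short persistence window, as in the three-second example above. The sketch below is illustrative only; the window length, identifiers, and data structures are assumptions.

    import time
    from collections import defaultdict

    PERSISTENCE_WINDOW_S = 3.0  # example window from the description above

    # Per-user list of (timestamp, object_id) for recently established associations.
    recent_associations = defaultdict(list)

    def on_object_interaction(user_id, object_id, now=None):
        """Record a new object association and return the objects whose
        associations still fall within the persistence window."""
        now = time.time() if now is None else now
        entries = recent_associations[user_id]
        entries.append((now, object_id))
        # Drop associations that have expired.
        recent_associations[user_id] = [
            (t, obj) for (t, obj) in entries if now - t <= PERSISTENCE_WINDOW_S
        ]
        return [obj for (_, obj) in recent_associations[user_id]]

    # A user touches one product and then a second within the window:
    print(on_object_interaction("user-1", "cereal-a", now=100.0))
    print(on_object_interaction("user-1", "cereal-b", now=101.5))  # both objects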

As used herein, object type is used to characterize the various properties of an object. The object type and the combination of object types involved in an object association will generally be used in selecting an appropriate interaction response/action. Some primary variations of object types can include passive objects, context-loaded objects, and active objects.

A passive object preferably characterizes an object that primarily serves to modify the associative interaction by its visible identity. A passive object will generally not exhibit digital communication with the system (at least in the context of detecting associative interactions). Passive objects can be objects like a product.

A context-loaded object may be an object assigned some special significance. For example, a sticker placed on a shelf to indicate a "favorite" action can enable a user to tap that sticker while holding an item to favorite it. A context-loaded object is usually used to set the type of action applied.

Active objects are generally computing devices integrated into the system that can be responsive to some action or can provide some supplementary form of input. An active object will generally be communicatively coupled with the system in some manner such that internal digital state can be communicated from the object to the system and/or data can be communicated to the active object. A tablet can be an active object where it will display different content based on what object is held by the user when the tablet is touched.

As used herein, a user is used as a descriptor for a CV-detectable person that is usually the agent responsible for establishing an interaction topology. Other types of agents may additionally be detectable alongside or in place of the user. The associative interaction framework is preferably operable across multiple users in an environment simultaneously.

The system and method are particularly applicable when implemented in an environment with an imaging system with distributed imaging devices. Since the environment will likely have multiple users observed at various instances, this variation may enable multiple users to deliver independent input through an environment-installed system to facilitate personalized interactions with a connected computing system. Additionally, the associative framework relies on more readily detectable "gestures" that can be detected through overhead video/image-based monitoring without the user explicitly "performing" the interaction for a singular target. The associative interaction framework is preferably robust as a form of user interaction when the user is facing any direction and in any location in the monitored environment. Similarly, the system and method promote overt actions readily detected and distinguished but interpreted to specify specific targeted interactions with a system.

As previously indicated, the system and method may be particularly applicable in use cases that involve object interactions. One exemplary use case can include building user experiences for consumers and workers in a commerce or retail setting where interactions with different products and objects in the environment trigger different events. In a commercial space, people browsing an aisle have their attention directed at where they are moving or towards the product of interest. In this environment, there is a plurality of substantially static objects and a user can selectively initiate an associative interaction with an object, which makes such an interaction framework particularly useful. The system and method may alternatively be used in any situation where an agent will be interacting with other objects or agents.

In particular, the system and method may be applicable as an interaction framework for CV-driven commerce. One such form of CV-driven commerce is automatic or accelerated checkout. Automatic checkout may rely on detection and accounting of items selected by a user. The system and method can provide a complementary interaction framework such that a user can perform a wider range of interactions beyond just adding or removing items from a virtual cart. The system and method may be used to perform actions such as generating an in-store browsing history, triggering a price check for an item, comparing prices of items, getting nutritional guidance, adding an item to a wishlist, enabling a user to select different information options by modifying an interaction with a gesture or by interacting with a context-loaded object, and/or performing any suitable action. The interaction framework may alternatively be used in any suitable environment such as the medical/hospital space, industrial/manufacturing space, food services, construction, office work, education, home and personal use, and/or any suitable application.

An environment as used herein characterizes the site where a CV monitoring system is installed and operational. The system and method can be made to work for a wide variety of environments. In a preferred implementation, the environment is a shopping environment such as a grocery store, convenience store, micro-commerce & unstaffed store, bulk-item store, pharmacy, bookstore, warehouse, mall, market, and/or any suitable environment that promotes commerce or exchange of goods or services. An environment is generally the inside of a building but may additionally or alternatively include outdoor space and/or multiple locations. In alternate use cases, the environment can include a household, an office setting, a school, an airport, a public/city space, and/or any suitable location. The environment can be a locally contained environment but may alternatively be a distributed system with wide coverage.

As one potential benefit, the system and method can enable an interaction framework for predominantly CV-driven applications. In environment-installed CV-driven applications there may be no known "sensor target" where a user can direct a gesture for user input. In other words, the users performing the interaction may not be directing interactions in the direction of a targeted sensing device. Accordingly, a CV-driven application operating over an imaging system installed in an environment may benefit from a more intuitive interaction framework that leverages intuitive interactions and contextual interactions. In places of commerce, the system and method can provide a rich set of user interactions that are detectable and naturally performable by users.

As another potential benefit, the system and method can enable intuitive interactions. The system and method work to transform picking up items, pointing at items, and similar actions into system inputs. In one preferred implementation, the gesture interactions are natural enough that interaction events are triggered during normal user actions, but are also explicit enough that a user can discover and explicitly invoke such interaction events with some measure of control. Additionally, since the user is not directing the interactions towards a specific device for the intention of detecting the interaction, a user is relieved of actively targeting a sensing device.

Similarly, the interaction framework can be based around more natural interactions. However, variations of the system and method may appropriately gate such object interactions by explicit actions of the user so that unintentional triggering of an associative interaction is prevented or minimized.

Similarly, another potential benefit of the system and method is to customize interactions according to individual user preference. As a related potential benefit, the system and method may enable temporarily or permanently disabling interactions according to individual user preference. In one implementation, associative interactions may be incrementally introduced to a user during natural performance by a user, which functions to naturally introduce or expose functionality. In some variations, the user may select a preference to enable or disable the associative interactions in the future. For example, the first time a user picks up two products, product comparison feedback in response to this detected associative interaction can be introduced and, optionally, the user can enable or disable this and other related features.

As another potential benefit, the system and method may enable contextual interactions. The interactions and the resulting interaction events can be customized to the involved objects, which can open up a wide range of types of control input. The framework can be highly extensible to enable a variety of ways that interactions may be modified or expressed. This framework can be customized and applied to different situations and use cases. Additionally, a user can exert active control over setting of context to direct the extraction of information and/or communicating some input.

As another potential benefit of some implementations, the system and method may enable passive and low infrastructure options for augmenting interactions. As the system and method operate over object interactions, all objects can become computer inputs to the CV-driven application. In a large environment, support infrastructure like computing kiosks can be cost prohibitive, and as such frequently not installed in many locations. The system and method can enable each object to be transformed to a mechanism that facilitates interaction input. In some implementations, context-loaded elements can be used within an environment to expose contextual interaction controls with minimal infrastructure investment. For example, printed labels can be positioned throughout the store and act as a context-loaded object to specify some associative interaction property.

As another potential benefit, the system and method can facilitate optimization of gesture-based associative interaction monitoring across a plurality of objects. For example, the system and method can support selective monitoring for associative interactions according to various trigger conditions. This can function to enable better computational efficiency and performance by not monitoring for interactions at all times for all users. For example, an associative interaction can depend on some level of proximity to an object. Accordingly, the system may trigger associative interaction monitoring only when a user is in a location that satisfies one or more object proximity conditions.
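The gating described above might be sketched as follows: associative interaction monitoring runs only for users whose tracked position satisfies an object proximity condition. The positions, region centers, and threshold are illustrative assumptions, not system parameters.

    # Sketch of proximity-gated monitoring: interaction detection is only run
    # for users near a monitored object region. Values are illustrative.

    PROXIMITY_THRESHOLD_M = 1.5

    def near_any_region(user_xy, region_centers, threshold=PROXIMITY_THRESHOLD_M):
        ux, uy = user_xy
        return any(((ux - rx) ** 2 + (uy - ry) ** 2) ** 0.5 <= threshold
                   for (rx, ry) in region_centers)

    def users_to_monitor(tracked_users, region_centers):
        """Return only the users for whom interaction monitoring should run."""
        return [uid for uid, xy in tracked_users.items()
                if near_any_region(xy, region_centers)]

    shelf_regions = [(2.0, 5.0), (8.0, 5.0)]
    tracked = {"user-1": (2.3, 5.4), "user-2": (20.0, 1.0)}
    print(users_to_monitor(tracked, shelf_regions))  # only user-1 is gated in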

2. Associative Interaction Framework

The associative interaction framework operates based on the detection of an associative interaction through a CV monitoring system and then triggering an interaction event based on the properties of the associative interaction at one or more connected computing systems. The associative interaction framework may have a number of dimensions for altering the interpretation of an interaction. Dimensions of the framework characterize possible factors that can be varied to alter the intent of an interaction. Two main dimensions of associative interactions include interaction topology and involved object types. Other optional dimensions can include user context, interaction modifiers, and interaction triggers. These different dimensions may be used to alter the interaction event and/or set the interaction properties.
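As a non-limiting sketch, the dimensions above could be captured as interaction properties on an event record handed to the connected computing system. The field names below are assumptions for illustration, not a defined schema.

    # Sketch of an associative interaction event record capturing the dimensions
    # discussed above. Field names are illustrative, not a defined schema.
    from dataclasses import dataclass, field
    from typing import Dict, List, Optional

    @dataclass
    class ObjectAssociation:
        object_id: str
        object_type: str   # "passive", "context-loaded", or "active"
        interaction: str   # e.g. "touch", "hold", "point", "proximity"

    @dataclass
    class AssociativeInteractionEvent:
        user_id: str
        associations: List[ObjectAssociation]             # interaction topology
        user_context: Dict[str, str] = field(default_factory=dict)
        modifier: Optional[str] = None                     # e.g. a hand gesture
        trigger: Optional[str] = None                      # e.g. "hold-duration"

    event = AssociativeInteractionEvent(
        user_id="user-7",
        associations=[ObjectAssociation("soup-001", "passive", "hold"),
                      ObjectAssociation("favorite-sticker", "context-loaded", "touch")],
        user_context={"shopping_list": "weekly"},
    )
    print(event)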

An interaction topology is the set of objects associated with some interaction and the configuration of how those objects are associated. An interaction topology preferably involves at least two objects that are associated with each other based on interaction. Associations are preferably established through contact, proximity, holding an object, a particular interaction like pointing, or other detectable interactions between two or more objects. In one preferred type of associative interaction, the user acts as the associative bridge as an intermediary object establishing object associations with multiple objects. In other cases, non-human objects can act as the associative bridge. For example, a special mat, a basket, or shelf may be an associative bridge for objects placed in or on it. Placing items in a particular region may be used to convey some input to the system.

The interaction topology provides a flexible detectable construct on which various forms of interactions can be built. Different use cases may leverage different forms of interaction topologies. Two primary forms of interaction topologies can include single object associations and multi-object associations.

As shown in FIG. 2A, a single object association can be established as an association between a user and one object. An object association can be established by various visually identifiable events. Preferably, visually identifiable events can include a user touching the object, a user grabbing or picking up an object, a user pointing at an object, a user and an object satisfying a proximity condition (e.g., having a displacement under a set threshold), a user adding an object to an existing user-associated object such as a cart or bag, and/or any suitable type of interaction. With a user, associations may be based on hand interactions, finger interactions, foot/leg interactions, head interactions, body interactions, and/or any suitable interaction. Different objects may have different interaction events that are used to establish an association. For example, a small product (e.g., a pillow) in a store may be added to the interaction topology in response to detecting a user picking up and holding the item still for longer than half a second, but a large product in the store may be added to the interaction topology in response to detecting a user pointing at the product.
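The idea that different objects may use different visually identifiable events to establish an association can be sketched as a small rule table; the rule contents and observation fields are illustrative assumptions.

    # Sketch: different objects can require different detectable events to
    # establish an association (hold duration for small items, pointing for
    # large items). Rules and observation fields are illustrative.

    ASSOCIATION_RULES = {
        "small-product": {"event": "hold", "min_duration_s": 0.5},
        "large-product": {"event": "point"},
    }

    def association_established(observation):
        """observation: dict with object_class, event, and optional duration_s."""
        rule = ASSOCIATION_RULES.get(observation["object_class"])
        if rule is None or observation["event"] != rule["event"]:
            return False
        return observation.get("duration_s", 0.0) >= rule.get("min_duration_s", 0.0)

    print(association_established(
        {"object_class": "small-product", "event": "hold", "duration_s": 0.7}))  # True
    print(association_established(
        {"object_class": "large-product", "event": "point"}))                    # True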

As shown in FIGS. 2B-2D, a multi-object association can be established as an association of a user with two or more objects. For example, a user interacting with two objects simultaneously (e.g., holding two objects—one object in each hand) can establish a multi-object association. Multi-object associations are preferably established through multiple single object associations. In one preferred variation, a multi-object association is established through a user holding or touching two objects. In another preferred variation, a multi-object association may be established by a user standing in a designated area (e.g., a marked location on the ground) and holding or touching an object at the same time as shown in FIG. 2D. More than two single-object associations could also be established as also shown in FIG. 2D.

A multi-object association may be established with various sequencing rules or conditions. For example, one variation may have a temporal proximity condition where multiple object associations are recognized when made within a time window; otherwise only the initially established object association is considered. Alternatively, multi-object associations may be made dynamically whenever two or more object associations can be detected. For example, a user may grab a first object and then sequentially touch a second object, a third object, and then a fourth object, which could trigger detecting three multi-object associative interactions (e.g., a first and second object associative interaction, a first and third object associative interaction, and a first and fourth object associative interaction).
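The dynamic variation above, where a sustained first association is paired with each subsequently detected association, might look like the following sketch; the object identifiers are placeholders.

    # Sketch: forming one multi-object associative interaction per subsequent
    # touch while a first association (e.g., a grabbed object) is sustained.

    def pairwise_interactions(held_object, touched_sequence):
        """Yield one multi-object association per subsequent touch."""
        for touched in touched_sequence:
            yield (held_object, touched)

    for pair in pairwise_interactions("object-1", ["object-2", "object-3", "object-4"]):
        print("multi-object associative interaction:", pair)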

In one optional operating mode, multi-object associations may be made in an accessibility mode that functions to enable multiple single object associations to be built up to specify a multi-object association. In one implementation, a multi-object association may be created by a user establishing multiple single object associations within a time window but without the single object associations necessarily overlapping in time. For example, a user with mobility challenges may enable an accessibility mode that is specifically applied to monitoring of associative interactions for that user. That user, to compare two products, may pick up one item and then, within a long enough time period, pick up a second item to trigger a multi-object associative interaction event.

As mentioned, a user or an alternative type of agent may be an element of the associative interaction. Herein, a user is used as the exemplary agent, but an alternative agent such as an animal, robot, automobile, or other suitable type of agent could similarly act as the agent. There can be single agent and multi-agent variations of associative interactions. As an example of a multi-agent variation, a first and second user can make contact establishing an association between the two users. In this way a chain of object associations may be made. In another example, there may be use cases where any user in a set region should be considered linked, such that they can cooperatively trigger interactions.

The interaction object type of an object is another dimension of an associative interaction that functions to characterize the classification of the object. Different types of objects may trigger and/or alter different forms of interactions. The objects are preferably classified such that the classification and/or identity will determine the interaction. Object type can include general classification of an object, CV-derived object classification, a unique identifier for an object, linked properties of the object (e.g., product properties linked to the product identifier), and/or other forms of type classifications. Two general object types that can result in different forms of associative interactions include passive objects and active objects.

Passive objects are preferably objects with some detectable classification and/or identity and that have no integration with the CV-based system. The classification/identity of a passive object is the input from that object used in at least partially defining an associative interaction. For example, a product in a store can be a passive object—a user touching a can of soup will result in the product ID for that can of soup being used in triggering an associative interaction with the product ID as an interaction input.

Passive objects could additionally be made to convey enhanced meaning. Context-loaded objects may serve to convey some special context to an interaction. A context-loaded object can be classified or identified as with other passive objects, but the object may be assigned a set meaning within the interaction framework. For example, graphical stickers with different meanings could be placed on a shelf. A user can make contact with a product and different stickers to trigger different actions like requesting price information, requesting nutrition information, or saving for later, as shown in FIG. 3A. Context-loaded stickers such as this could affordably be distributed in a store, but enable rich and intuitive interactions by a user.

A context-loaded object may additionally have multiple dimensions of interaction and/or some mechanism for modifying the context. As an example of a context-loaded object with two dimensions of interaction, the object may have some exposed surface and, depending on the point of interaction by a user (e.g., where a user touches in 2D space), different information is conveyed by interacting with that object. As an example of a modifier, a switch or dial that is part of a particular context-loaded object could be set to different positions and used to convey different information when involved in an associative interaction. The switch or dial could be passive in the sense that it is not digitally communicating its state to a connected system. The state of such a mechanical modifier is preferably visually detected by the CV monitoring system.

Active objects are preferably computer-based objects integrated into the system that can act as a digital input or output in addition to a CV-detectable object. An active input can preferably provide data to the system and method through a secondary channel (i.e., a non-visual channel). An active object can include a computing device such as a personal computing device, a digital kiosk, an autonomous device (e.g., robot, vehicle, etc.), or other suitable device. Examples of personal computing devices can include a personal phone, a smart watch, smart glasses, connected headphones, and/or any suitable type of personal computing device.

In many cases, the active object is capable of digital input and output, but some varieties of active objects may only be capable of either input or output.

As an input, the state of an active object may modify an interaction in response to input delivered to the active object. An example of an active input object could be a keyboard where a user can enter information that is digitally communicated to the system and method. As another example, a touch display of an active object can be used to navigate a connected app's menu and select some interaction option (e.g., type of information to present), and then a user could touch a product at the shelf to serve as a second object in a multi-object associative interaction. The app running on the active object preferably communicates with the CV monitoring system and/or a connected system so as to relay the state information (e.g., the selected interaction option). The displayed option can modify the interpretation of the interaction as shown in FIG. 4A. A smart watch may serve as a particularly useful active object as a user could readily set the digital state of an application and then interact with an object.

As an output, the state of the active object can be changed as a result of an associative interaction. An example of an active output object could be a digital speaker or display that outputs information as a result of an associative interaction.

User context can provide another optional dimension to an associative interaction. A user acting as an agent in an associative interaction can have settings or configuration associated with his or her identity that may alter the interpretation or resulting action of an interaction. User context is preferably applied by identifying a user through biometric identification, account registration/identification, device synchronization, and/or other approaches. In one variation, a user may self-identify using a check-in kiosk upon entering the environment. In another variation, the CV monitoring system can detect the identity of the user. The identity of a user may also be identified based on the user account logged into an application—this account-registered app can be matched to a user observed in the CV monitoring system.

Upon identifying the user, the settings, configuration, session state, and/or other forms of account-associated data can be accessed and used in altering the associative interaction. For example, a user may create a shopping list in an app. That shopping list may be used in altering the associative interaction when a user picks up an item from that list. Similarly, a user (or any type of agent) may have various forms of state information that can be used in altering interpretation or changing the resulting action of an interaction. For example, a user tracked within an environment may have CV-modeled state information (e.g., detected cart contents) that may be used independent of any account information in altering an associative interaction.

As another exemplary form of user context, a user may set preferences for how they want interaction events to be handled. A user may specify in an app or through an account management system their preference and configuration for actions responding to an associative interaction event. In one implementation, a user can specify their product values, which may relate to their nutritional/dietary restrictions or preferences, moral guidelines (e.g., purchasing only local items), or other preferences. These may be used to appropriately present the right information for each user. In one variation, the user could configure the user devices on which feedback is received. For example, they could specify that they want audio alerts to play through a connected headphone device or for notifications to be sent to a phone.

An associative interaction may additionally be partially determined through some alternative modifier. State or gestures of a user may be used in modifying an interaction. One form of controllable state can be a simultaneously detected gesture input. In one variation, during a single object association a user may apply a hand gesture modifier using the free hand as shown in FIG. 5. The hand gesture in combination with the other factors of the associative interaction can be used in specifying the result of the interaction. In one implementation, the number of fingers held out during an interaction may be used to issue a command. For example, a user can touch an object and then hold out one finger to save for later, two fingers to add to a delivery order, or three fingers to request nutritional information.
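A gesture modifier of this kind might be interpreted as in the sketch below, which mirrors the finger-count example above; the command names are hypothetical, and the finger-count estimate is assumed to come from the CV pipeline.

    # Sketch: interpreting a free-hand finger count as a modifier on a single
    # object association. Command names are hypothetical.

    FINGER_COMMANDS = {
        1: "save_for_later",
        2: "add_to_delivery_order",
        3: "request_nutrition_info",
    }

    def apply_gesture_modifier(user_id, object_id, finger_count):
        command = FINGER_COMMANDS.get(finger_count)
        if command is None:
            return None  # no recognized modifier; fall back to default behavior
        return {"user": user_id, "object": object_id, "command": command}

    print(apply_gesture_modifier("user-3", "granola-12", 2))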

Similarly, various forms of controllable state of an object could similarly be used to modify an associative interaction. In one variation, the orientation of an object could be modified to signal different forms of input. For example, a user may tilt an object in different orientations and/or hold it at different positions to signal different forms of input, thereby triggering different interactions. In another variation, some objects have different natural states, like a book (e.g., opened and closed), and depending on the state or change of state of the object during the object interaction, the associative interaction can be modified.

Because the interactions are based on natural actions of a user, there may be particular user input triggers that are used to regulate when to execute or act on a current interaction. Interaction triggers may be another factor of an associative interaction and are preferably used to provide a control mechanism in signaling when an associative interaction can be executed.

An interaction trigger could be an interaction modifier. For example, a user may need to say some keyword while performing an interaction to execute some action. In another variation, a user may have to perform an explicit hand gesture with one hand while establishing an association with another object. The input provided may additionally modify the interaction as discussed above.

Another interaction trigger could be time. For example, an associative interaction may be triggered only when the associative topology and/or the state of the associative interaction is held for some period of time.

As another variation, an interaction trigger could depend on detecting holding an object in a predefined orientation range. For example, an associative interaction may be triggered when an associated object is held flat with the "face" up. For example, to request a price check, a user may have to hold a box of cereal substantially flat and face up.

Another optional dimension of associative interactions can be the sequencing of detecting an associative interaction. An associative interaction may additionally have different stages of interactions such as association-start, association-sustained, association-change (e.g., if one or more objects are changed), and association-end. These may each trigger different events in the system. For example, the system could be configured to trigger various callbacks or event messages for each respective stage of an associative interaction. Similarly, the timing of these events can be used to convey some meaning. For example, events relying on modifiers may only be triggered on association-start or association-end.
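One plausible way to expose the interaction stages is a small callback registry keyed by stage name, as sketched below; the registration style and event payloads are assumptions for illustration.

    # Sketch of stage-based callbacks for an associative interaction lifecycle
    # (association-start, association-sustained, association-change,
    # association-end). Registration style is illustrative.
    from collections import defaultdict

    _callbacks = defaultdict(list)

    def on(stage, fn):
        """Register a callback for a lifecycle stage."""
        _callbacks[stage].append(fn)

    def emit(stage, event):
        for fn in _callbacks[stage]:
            fn(event)

    on("association-start", lambda e: print("started:", e))
    on("association-end", lambda e: print("ended:", e))

    emit("association-start", {"user": "user-9", "objects": ["soda-2"]})
    emit("association-end", {"user": "user-9", "objects": ["soda-2"]})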

The framework for associative interactions may be similarly extended to include other dimensions or factors in interpreting CV-detected input.

3. System for an Associative Interaction Framework

As shown in FIG. 1, a system 100 for an associative interaction framework can include a CV monitoring system 110 coupled to an interaction platform 120. Variations of the system may additionally include an account system 122, active interaction devices 130, and/or context-loaded elements 140. The system 100 functions to apply the associative interaction framework described above in executing interaction events within the interaction platform 120. The system 100 may be used in any suitable application. As described above, the system 100 can be particularly applicable to CV-driven applications within open environments where a plurality of agents may interact with an interaction platform 120 through a shared imaging system. Accordingly, the system 100 can independently and simultaneously detect associative interaction input of distinct agents in a monitored environment. In particular, the system may be applicable to enabling interactions for CV-driven commerce (e.g., automatic self-checkout).

In one preferred variation of the system 100, in an interaction framework used for user input within an interaction platform used in an environment, the CV monitoring system 110 includes a plurality of imaging devices distributed at distinct locations across an environment. At least a subset of imaging devices is preferably mounted in an aerial location. The CV monitoring system 110 is additionally preferably configured to: collect image data, classify objects in the environment, and detect object associations. Classifying objects in the environment preferably includes detection of a set of users that are present in the environment. Detection of object associations is preferably performed for at least one user and more preferably performed independently across a subset of users or all users. Associative interaction events may involve one or more object associations. In a two-object association, the CV monitoring system 110 preferably detects a first object association between the user and a first classified object, detects a second object association between the user and a second classified object, and initiates an associative interaction event with properties of the first classified object, the second classified object, and the user. The interaction platform 120 of the system 100 is preferably configured to initiate execution of a response to the associative interaction events with at least one computing device. This computing device may be an internal system of the interaction platform, a remote cloud-hosted computing system, a site-installed computing device, a user device, or any suitable computing device.
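The two-object flow described in this variation could be sketched as follows. The detection function is a hypothetical stand-in for the CV monitoring system, and the frame contents simulate its per-frame output; this is an illustration of the flow, not the claimed implementation.

    # Sketch of the two-object association flow: detect a first and second
    # object association for a user, then hand an event with combined
    # properties to the interaction platform. Functions are stand-ins.

    def detect_object_association(user_id, frame):
        # Hypothetical CV step: return an object id if the frame shows this
        # user establishing an object association, else None.
        return frame.get(user_id)

    def initiate_associative_interaction_event(user_id, first_obj, second_obj):
        return {"user": user_id, "objects": [first_obj, second_obj]}

    def execute_response(event):
        print("interaction platform response for:", event)

    def process_user(user_id, frames):
        first = None
        for frame in frames:
            obj = detect_object_association(user_id, frame)
            if obj is None or obj == first:
                continue
            if first is None:
                first = obj
            else:
                execute_response(
                    initiate_associative_interaction_event(user_id, first, obj))
                break

    # Simulated per-frame detections for one user:
    process_user("user-5", [{}, {"user-5": "soup-001"}, {}, {"user-5": "soup-002"}])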

A CV monitoring system 110 functions as a CV-driven imaging system to process and generate conclusions from one or more sources of image data. The CV system can provide: person detection; person identification; person tracking; object detection; object classification; object tracking; extraction of information from device interface sources; gesture, event, or interaction detection; and/or any suitable form of information collection using computer vision and optionally other processing techniques. The CV monitoring system 110 is preferably used to drive CV-based applications of the interaction platform 120. In the case of CV-driven commerce, the CV monitoring system 110 may facilitate generation of a checkout list (i.e., a virtual cart) during shopping, tracking inventory state, tracking user interactions with objects, controlling devices in coordination with CV-derived observations, and/or other interactions. The CV monitoring system 110 will preferably include various computing elements used in processing image data collected by an imaging system. In particular, the CV monitoring system 110 is configured for detection of agents (e.g., users) and established object associations as described above.

The CV monitoring system 110 can preferably track user activity for multiple users simultaneously, such that the system may support management of multiple virtual carts simultaneously.

The CV monitoring system 110 preferably operates in connection to an imaging system 112 installed in the environment. The imaging system 112 functions to collect image data within the environment. The imaging system 112 preferably includes a set of image capture devices. The imaging system 112 might collect some combination of visual, infrared, depth-based, lidar, radar, sonar, and/or other types of image data. The imaging system 112 is preferably positioned at a range of distinct vantage points. The imaging system 112 preferably forms substantially ubiquitous monitoring within the environment as described below. However, in one variation, the imaging system 112 may include only a single image capture device.

The image data is preferably video but can additionally or alternatively be a set of periodic static images. In one implementation, the imaging system 112 may collect image data from existing surveillance or video systems. The image capture devices may be permanently situated in fixed locations. Alternatively, some or all may be moved, panned, zoomed, or carried throughout the facility in order to acquire more varied perspective views.

In one variation, a subset of imaging devices can be mobile cameras (e.g., wearable cameras or cameras of personal computing devices). For example, in one implementation, the system 100 could operate partially or entirely using personal imaging devices worn by agents in the environment. The image data collected by the agents and potentially other imaging devices in the environment can be used for collecting various interaction data.

In a preferred implementation, at least a subset of the image capture devices are oriented for overhead monitoring, wherein the image capture devices collect a substantially aerial perspective. In a shopping environment, the imaging system 112 preferably includes a set of statically positioned imaging devices mounted with an aerial view from the ceiling. The aerial view imaging devices preferably provide image data across stored products monitored for virtual cart functionality. The imaging system is preferably installed such that the image data covers the area of interest within the environment (e.g., product shelves). In one variation, imaging devices may be specifically set up for monitoring particular items or item display areas from a particular perspective.

Herein, ubiquitous monitoring (or more specifically ubiquitous video monitoring) characterizes pervasive sensor monitoring across regions of interest in an environment. Ubiquitous monitoring will generally have a large coverage area that is preferably substantially continuous, though discontinuities of a region may be supported. Additionally, the monitoring may be performed with a substantially uniform data resolution.

Large coverage, in one example, can be characterized as having greater than 95% of the surface area of interest monitored. In a shopping environment, this can mean the shelves and product displays as well as the shopping floor are monitored. Substantially uniform data resolution preferably describes a sensing configuration where the variability of image resolution and/or coverage of different areas in the environment are within a target range. In the exemplary case of automatic checkout CV-driven applications, the target range for image resolution is sufficient to resolve product-packaging details for product identification.

Ubiquitous monitoring may optionally include the characteristic of redundant monitoring. This may involve having redundant coverage from multiple vantage points. For example, an item on a shelf may be visible by two different cameras with adequate product identification resolution and where the cameras view the item from different perspectives. In an environment like a grocery store this could mean 10-200 cameras distributed per aisle in some exemplary implementations.

Similarly, the system 100 may additionally include other computer input or output devices across an environment. The system 100 and method can be used in the collection of sensor data and/or generation of an output in addition to or as an alternative to video and/or image data. Other forms of devices such as microphones, Bluetooth beacons, speakers, projectors, and other suitable devices could additionally or alternatively be integrated into system modules that may be installed across an environment. Herein, the system and method are primarily described as they relate to image-based video monitoring.

The CV monitoring system 110 is preferably used in the detection of associative interactions, but the CV monitoring system 110 will generally simultaneously be used in executing other CV-based functionality. For example, in a store environment (e.g., a grocery store), the CV monitoring system 110 can be configured to additionally track a checkout list for automatic checkout and/or expedited checkout at a checkout station. In one variation, the CV monitoring system 110 may be used to generate a virtual cart, which may be performed in a manner substantially similar to the system and method described in U.S. Patent Application Publication No. 2017/0323376, filed 9 May 2017, which is hereby incorporated in its entirety by this reference. In other settings like an industrial, office, or hospital setting, the CV monitoring system 110 may be used to monitor worker actions and operations. In environments like a gym or other areas it may track activity. Herein, the use case of tracking item selection for facilitating checkout is used as a primary example, but the system 100 is not limited to such uses. The CV monitoring system 110 may be used for any suitable additional functionality alongside associative interaction monitoring.

The CV monitoring system 110 can include a CV-based processing engine and data management infrastructure. The CV-based processing engine and data management infrastructure preferably manage the collected image data and facilitate processing of the image data to establish various modeling and conclusions relating to interactions of interest. For example, the selection of an item and the returning of an item are of particular interest. The data processing engine preferably includes a number of general processor units (CPUs), graphical processing units (GPUs), microprocessors, custom processors, and/or other computing components. The computing components of the processing engine can reside local to the imaging system 112 and the environment. The computing resources of the data processing engine may alternatively operate remotely in part or whole.

The CV monitoring system may additionally or alternatively include human-in-the-loop (HL) monitoring, which functions to use human interpretation and processing of at least a portion of collected sensor data. Preferably, HL monitoring uses one or more workers to facilitate review and processing of collected image data. The image data could be partially processed and selectively presented to human processors for efficient processing and tracking/generation of a virtual cart for users in the environment.

The system 100 may additionally include additional sensing systems such as a user location tracking system. Location tracking can use Bluetooth beaconing, acoustic positioning, RF or ultrasound based positioning, GPS, and/or other suitable techniques for determining location within an environment. Location can additionally or alternatively be sensed or tracked through the CV monitoring system 110. The CV monitoring system 110 can include a user-tracking engine that is configured to track user location. Preferably, the user location can be used to generate contextual data of user location relative to the environment. This may be used to detect items in proximity to a user. Nearby items can be set as a set of candidate items, which may be used to bias or prioritize identification of an item during management of the virtual cart.
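Biasing identification toward nearby items could be sketched as below; the item coordinates and radius are illustrative assumptions rather than system data.

    # Sketch: using tracked user location to build a candidate item set that
    # can bias product identification. Locations and radius are illustrative.

    ITEM_LOCATIONS = {
        "soup-001": (2.0, 5.0),
        "soup-002": (2.2, 5.1),
        "detergent-9": (14.0, 3.0),
    }

    def candidate_items(user_xy, radius_m=2.0):
        ux, uy = user_xy
        return [item for item, (ix, iy) in ITEM_LOCATIONS.items()
                if ((ux - ix) ** 2 + (uy - iy) ** 2) ** 0.5 <= radius_m]

    # Items near the user can be prioritized when resolving an ambiguous detection.
    print(candidate_items((2.1, 5.3)))  # -> ['soup-001', 'soup-002']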

In one implementation the image capture devices can be distributed as camera modules. The camera modules may be multifunctional and can include other supplementary components used in offering additional or enhanced sensing or functionality. Supplementary components may include microphones, speakers, area lighting, projectors, communication modules, positioning system modules, and/or other suitable components. Alternatively, the supplemental sensors and computing components may be integrated into the system 100 separately or in any suitable manner. In one variation, the camera module and/or the system 100 can include microphones such that a distributed audio sensing array can be created. Audio sensing can be used in identifying, locating, and collecting audio input from different locations. For example, the system with microphones can triangulate sounds to determine location within the environment. This can be used to facilitate CV-based tracking. This could alternatively be used in enabling audio-based interactions with the system 100. In one variation, the microphone array provided through the monitoring network may be used to facilitate multi-user audio-interfaces within an environment (e.g., an in-store customer audio-interface). For example, a user could issue audio commands from any place in the store, and this could be synchronized with the CV-driven application, which may be used to associate a detected audio command with a user entity or account issuing that command. In one implementation, the microphone array may be used in differentially locating, processing, modifying, and responding to audio sources as discussed in published U.S. patent application Ser. No. 17/717,753, filed 27 Sep. 2017, which is hereby incorporated in its entirety by this reference.

In another variation, the camera module and/or the system 100 can include integrated speakers, which can function to enable audio output. In one implementation, this may be used to simply play audio across an environment. The speakers are preferably individually controllable, and targeted audio could be played at different regions. This can be used in delivering audio feedback to a user based on the associative interactions of that particular user, where the feedback is played on an environment-installed speaker that is near the user.

An interaction platform 120 functions to be a computing environment that can be responsive to detected associative interactions. The interaction platform 120 preferably manages the interactions as well as possibly orchestrating other functionality. An interaction platform 120 is preferably a remote computing environment where account-based interactions can be executed. The interaction platform 120 could additionally or alternatively be computing resource(s) that are locally hosted at or near the environment. The interaction platform 120 may include an account system 122. The interaction platform 120 may alternatively be one or more computing devices.

In one use-case, the interaction platform 120 is configured to facilitate automatic self-checkout, facilitated checkout (using a CV-based detected checkout list), and/or in-store commerce interactions. The associative interactions may be used within the interaction platform 120 to facilitate interactions such as creating a physical in-store browsing history, performing a price check, comparing prices between two objects, requesting nutritional guidance, adding an item to a wishlist, triggering an in-store promotion, augmenting in-store device interactions, and/or other suitable forms of interaction.

An account system 122 can include account-level configuration and data associations. An account system 122 may be used to store user state information, user preferences, user lists or digital collections, user platform history (e.g., purchase history and the like), and/or any suitable information. The account system may additionally include a record of user devices that may be usable as inputs or outputs for the associative interaction events.

An active interaction device 130 functions to bridge digital interactions with objects included in the interaction topology. An active interaction device 130 is preferably a computing device that can act as an active object as described above. Multiple types of active interaction devices 130 may be used within an environment. An active interaction device 130 can either provide an additional source of input in augmenting the associative interaction and/or act as an output controlled in response to the associative interaction. An active interaction device 130 may be an environment-installed device. For example, an informational computer kiosk or a checkout kiosk may be distributed within a store environment. In this variation, multiple users may be expected to interact with the device. Accordingly, user-to-device object associations may be established as a user uses the device. Biometric or other forms of user identification may be used in determining who is interacting with the device. Alternatively, the CV monitoring system 110 could facilitate detecting who is using the device.

Active interaction devices 130 may alternatively be user-controlled devices such as a user phone, a smart watch, smart glasses (e.g., augmented glasses), connected headphones, other wearable computing devices, and/or any suitable computing device. Commonly, the user-controlled device will include an installed application that can run in the foreground and/or background to facilitate managing state, receiving user input, collecting sensor data, and/or controlling user interface output (e.g., visual, audio, tactile feedback). A personal active interaction device 130 may be explicitly observed during the associative interaction. For example, a user may be holding their phone with an app active. Alternatively, the personal active interaction device 130 may be detected or previously detected but hidden or obscured during an associative interaction. For example, a user may set some option impacting interaction events on their phone, but then have their phone in their pocket when establishing an associative interaction with a product. The app/phone's state may still be considered as an associative interaction property and used to modify the interaction event.

Personal computing devices may additionally be used as an output (independent of defining an instance of an associative interaction). For example, the display or audio output of a personal computing device may be updated and controlled in response to associative interactions involving the device owner. In some instances, phones, smart watches, smart glasses (e.g., glasses or other head-worn devices that may have a camera/imaging system, AR/VR display, microphone, speakers, etc.), connected headphones, or other personal devices may communicatively connect to the interaction platform 120 while the user is in the environment so that various forms of output could be delivered to that device in response to associative interaction events. For example, in response to an associative interaction mapped to some form of information delivery, the interaction platform 120 may be configured to send an instruction to a computing device of the involved user to play audio that relays that information. Similarly, a display may be instructed to present the information.

The system 100 may additionally include context-loaded elements 140 that function to act as context-loaded objects within the associative interaction framework. Context-loaded elements 140 can be any suitable type of object. One preferred implementation may use graphical markers as context-loaded elements 140. Graphical markers can be stickers, signage, marketing materials (e.g., product packaging), or other forms of graphical markers. As one exemplary use of context-loaded elements 140, the system 100 may include stickers with distinct graphical regions that can be touched during an associative interaction to trigger different interactions. In another example, a marker on the ground may be used as a context-loaded object, where a particular type of associative interaction is triggered when a user steps on the marker.

A context-loaded element 140 is preferably an object that is configured to be visually detectable and identifiable (e.g., identifying type or a unique identifier). The context-loaded element 140 may additionally include two or more sub-regions that are distinct so as to signal distinct modifications to an associative interaction. As another variation, the context-loaded element 140 may have an interaction region where interactions can be interpreted along a graduated scale. A context-loaded element could include a region along a path where touch contact can signal some scale metric (e.g., a value varying from 0 to 10). This may be used, for example, to set the volume of an audio system that acts as an active interaction device 130 when the user establishes object associations with the context-loaded element (e.g., touching the scale) and the audio system (e.g., pointing at the audio system device).
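
As a minimal sketch of how such a graduated scale region might be interpreted, the following hypothetical helper maps an estimated touch position along the element's scale region to a 0-10 value and attaches it to the interaction properties; the names, coordinates, and thresholds are illustrative assumptions and not part of the described system.

```python
# Hypothetical sketch: mapping an estimated touch point on a context-loaded
# element's scale region to a graduated value (e.g., audio volume 0-10).
# All names and numbers are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ScaleRegion:
    # Scale region described by its two endpoints in image coordinates.
    x_start: float
    y_start: float
    x_end: float
    y_end: float
    min_value: float = 0.0
    max_value: float = 10.0

def scale_value_from_touch(region: ScaleRegion, touch_x: float, touch_y: float) -> float:
    """Project the touch point onto the scale axis and map it to [min, max]."""
    dx, dy = region.x_end - region.x_start, region.y_end - region.y_start
    length_sq = dx * dx + dy * dy or 1.0
    # Normalized position of the touch along the scale (clamped to [0, 1]).
    t = ((touch_x - region.x_start) * dx + (touch_y - region.y_start) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return region.min_value + t * (region.max_value - region.min_value)

# Example: a horizontal scale printed between x=100 and x=300 in the image.
volume_scale = ScaleRegion(x_start=100, y_start=250, x_end=300, y_end=250)
interaction_properties = {"scale_value": scale_value_from_touch(volume_scale, 240, 252)}
print(interaction_properties)  # {'scale_value': 7.0}
```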

The system 100 and its components preferably include machine-readable configuration to execute or perform the operations described herein.

4. Method for an Associative Interaction Framework

As shown in FIG. 6, a method for an associative interaction framework of a preferred embodiment may include collecting image data S110; classifying objects in the environment S120; detecting associative interaction events of the objects S130; and executing an action response based on the associative interaction event S140. The method functions to facilitate detection and execution of an associative interaction framework.

As primarily described herein, the method is described as being used across a plurality of users in an environment. For example, the method could be used for an interaction framework to collect user input in a store environment. Using an environment CV monitoring system, which is used in monitoring multiple users, the method is preferably executed in parallel across multiple users within an interaction platform. Furthermore, some variations of the method can operate across users in an environment independent of the user having previously been enrolled or possessing a configured device.

The method may alternatively be used for more confined environments and optionally be limited to monitoring one or a limited set of users. The method herein is primarily described as it can be used in commerce-based environments, but as described above it can be applied in any suitable environment. The method is preferably implemented by a system as described above, which is configured for facilitating an associative interaction framework. Similarly, the method is preferably used to implement and facilitate the associative interaction framework and its various potential implementations as described herein.

Block S110, which includes collecting image data, functions to read or access video and/or image data from the environment. This may include collecting image data from a plurality of imaging devices that are distributed across an environment. In one preferred implementation, the collection of image data is achieved through a CV monitoring system configured for ubiquitous monitoring as described above. Each imaging device preferably collects a stream of image data, which may be analyzed individually or in coordination with one or more additional streams of image data. In some environments such as stores and/or commercial spaces, a subset of the image data may be collected from an aerial perspective. The imaging devices preferably include suspended mounting fixtures such that they can be secured to the ceiling, shelving, pillars, or other structures such that the imaging devices capture users from above. In many instances, the imaging devices are configured to be positioned at least eight feet above the floor. Collecting image data may alternatively include collecting image data from a single camera. In one alternative implementation, smart glasses with a camera may implement a version of the method for detecting associative interactions using a single imaging device of the smart glasses.
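
A minimal sketch of block S110 under the assumption that each imaging device exposes a standard video stream readable with OpenCV; the stream URLs and camera identifiers below are hypothetical placeholders, not part of the described system.

```python
# Minimal sketch of collecting image data (block S110) from several imaging
# devices, assuming each exposes a standard video stream readable by OpenCV.
# Stream URLs and camera identifiers are hypothetical placeholders.
import cv2

CAMERA_STREAMS = {
    "aisle-3-overhead": "rtsp://camera-01.example.local/stream",
    "aisle-3-endcap": "rtsp://camera-02.example.local/stream",
}

def collect_frames(streams: dict) -> dict:
    """Grab one frame per camera; returns {camera_id: frame} for successful reads."""
    frames = {}
    for camera_id, url in streams.items():
        capture = cv2.VideoCapture(url)
        ok, frame = capture.read()
        capture.release()
        if ok:
            frames[camera_id] = frame
    return frames

if __name__ == "__main__":
    frames = collect_frames(CAMERA_STREAMS)
    for camera_id, frame in frames.items():
        print(camera_id, frame.shape)  # e.g., (1080, 1920, 3) per aerial camera
```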

One potential benefit of the associative interaction framework is that it can be operable through “gestures” or interactions that do not need to be directed towards an imaging device. Multiple imaging devices may be used in combination in providing image data used in detecting a single associative interaction. Multiple independent associative interactions can preferably be detected within the image data in different locations and at different times. Accordingly, classifying of objects S120 and detection of object associations in block S130 may be achieved through two or more streams of image data. For example, block S130 may include detecting a first object association through a first stream of image data collected from a first camera and detecting a second object association through a second stream of image data collected from a second camera.

Block S120, which includes classifying objects in the environment, functions to detect a label or identifier for objects in the image data. Classifying objects can include classifying passive objects, context-loaded objects, active objects, and/or other suitable types of objects. Classifying objects may additionally include detecting and/or tracking users or agents. Detection and/or tracking of users may be implemented through the same or a different process than classifying other types of objects. In an implementation used for commerce-based use cases, individual product items and users can be two types of classified objects. Product items can be classified using computer vision based machine learning and algorithmic approaches to recognize an item by its packaging, shape, and/or other properties. Classifying a product can generally map the product back to a product identifier (e.g., a SKU identifier), which may have a variety of properties associated with it.

In the case of a food-related product, properties of the product can include information such as a product name, a quantity metric, price, price per unit, nutritional information, ingredient list, certifications (e.g., Organic, non-GMO, gluten-free, sustainably sourced, etc.), and/or other attributes. Some or all properties may be accessed in determining interaction events or used as part of a resulting action. For example, an interaction event may result in information relating to one or more properties being communicated to the associated user.

Various CV-based object classification techniques may be employed in object detection and classification such as a “bag of features” approach, convolutional neural networks (CNN), statistical machine learning, or other suitable approaches. Neural networks or CNNs such as Fast regional-CNN (R-CNN), Faster R-CNN, Mask R-CNN, and/or other neural network variations and implementations can be executed as computer vision driven object classification processes. Image feature extraction and classification is an additional or alternative approach, which may use processes like visual words, constellation of feature classification, and bag-of-words classification processes. These and other classification techniques can include use of scale-invariant feature transform (SIFT), speeded up robust features (SURF), various feature extraction techniques, cascade classifiers, Naive-Bayes, support vector machines, and/or other suitable techniques. Object classification and detection models can be trained on particular types of device interface sources.
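
The following is an illustrative sketch only of one such classification path: a cropped product image run through a CNN whose output class is mapped to a SKU identifier. The model, weights, class list, and SKU table are assumptions; the described system may use any of the techniques listed above.

```python
# Illustrative sketch only: classifying a cropped product image with a CNN and
# mapping the prediction to a SKU identifier. The model weights, class list, and
# SKU table are assumptions; any of the techniques listed above could be used.
import torch
from torchvision import models, transforms
from PIL import Image

# Hypothetical mapping from model class index to a SKU identifier.
CLASS_INDEX_TO_SKU = {0: "SKU-0001-CEREAL", 1: "SKU-0002-SOUP", 2: "SKU-0003-SNACK"}

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def classify_product(crop_path: str, model: torch.nn.Module) -> str:
    """Return the SKU identifier predicted for a cropped product image."""
    image = Image.open(crop_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        logits = model(batch)
    class_index = int(logits.argmax(dim=1))
    return CLASS_INDEX_TO_SKU.get(class_index, "SKU-UNKNOWN")

# A ResNet backbone fine-tuned on product crops is assumed here.
model = models.resnet50(num_classes=len(CLASS_INDEX_TO_SKU))
model.eval()
```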

As part of or in addition to classifying objects, the method can include detecting users that are present in the environment. In some variations, the method may more specifically include tracking a plurality of users. In one variation, the method may simply detect users without performing persistent tracking of the user through the environment. Tracking preferably identifies a user and then monitors their path through the environment. Detecting and/or tracking of users is preferably performed through computer vision analysis of the image data.

Users can be uniquely identified. In this variation, the user objects may be associated with an account or user identity record. User objects can be associated with an account or user identity through biometric identification, user-initiated identification, device identification, or other suitable forms of user identification.

Alternatively, users may not be uniquely identified and may simply be detected as a user. For example, without performing unique identification, the method could enable a customer in a store to be detected interacting with a product, and then a store-installed speaker or screen could display information relevant to that product.

Detection and/or tracking of a user may additionally use other forms of sensing such as Bluetooth beaconing, synchronizing with personal computing devices, position tracking systems, and/or other suitable systems.

User objects may additionally be biomechanically modeled such that the body parts of the user can be monitored for object interactions. Accordingly, the body, arms, hands, legs, feet, head, and/or other body parts may be modeled.

Detecting and/or tracking a user can be used to enable or disable monitoring for an associative interaction. Detecting a user as a non-interactive user can disable evaluation for associative interactions. In one variation, the tracking of a user as it relates to location, orientation, and movement may be used to temporarily or permanently disable associative interaction detection for that particular user as shown in FIG. 7. Associative interaction detection can be disabled for users positioned in regions out of range for performing associative interactions. For example, a user in the middle of an aisle may be too far away to be associated with an object. Additionally or alternatively, associative interaction detection can be disabled for users moving in a way that satisfies particular properties (e.g., walking at a fast pace), oriented relative to nearby objects in a particular manner (e.g., facing away from products), or satisfying other conditions.
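
As a non-authoritative sketch, the gating described above might be expressed as a simple predicate over tracked user state; the thresholds and field names below are illustrative assumptions, not values taken from the described system.

```python
# Illustrative sketch of temporarily disabling associative interaction
# detection for a tracked user based on distance, speed, and orientation.
# Thresholds and field names are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class TrackedUser:
    distance_to_nearest_shelf_m: float  # estimated from CV tracking
    walking_speed_m_per_s: float        # estimated from position history
    facing_products: bool               # estimated from body/head orientation

MAX_REACH_DISTANCE_M = 1.0
FAST_WALK_SPEED_M_PER_S = 1.5

def interaction_detection_enabled(user: TrackedUser) -> bool:
    """Return False when the user is out of range, moving quickly, or facing away."""
    if user.distance_to_nearest_shelf_m > MAX_REACH_DISTANCE_M:
        return False
    if user.walking_speed_m_per_s > FAST_WALK_SPEED_M_PER_S:
        return False
    if not user.facing_products:
        return False
    return True

# Example: a user walking briskly down the middle of an aisle is skipped.
print(interaction_detection_enabled(TrackedUser(2.4, 1.8, False)))  # False
```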

In another variation, the detection of a user as it relates to identity can be used to temporarily or permanently disable associative interaction detection for that particular user. For example, workers in a store may be identified. A user could be classified as a worker based on appearance (e.g., wearing a uniform or worker badge). A user could alternatively be classified as a worker using unique identification of the user and then detecting that the user identity is a worker. Other alternative approaches may similarly be used.

Block S130, which includes detecting associative interaction events of the objects, functions to determine when objects are part of an associative interaction. An associative interaction event preferably involves detecting interaction conditions of two or more objects. Interaction conditions preferably characterize the condition in which object associations are established. In a basic two-object interaction, this will generally involve establishing at least one association between an agent (e.g., a user) and at least a second object. Multiple object associations may satisfy an interaction condition such that the interaction topology is a multi-object association.

A site-installed CV monitoring system will preferably detect the associative interaction event. Another system component may alternatively be configured to interface with the CV monitoring system to analyze image data and detect associative interaction events.

The triggering of an action is preferably responsive to the various properties of the associative interaction event such as the interaction topology, object type, modifiers, user/agent associated data, input from active objects, and/or other factors and interaction properties. The associative interactions preferably map to appropriate actions and events.

An interaction condition is preferably a CV-derived condition, which involves detecting an object association condition and establishing the object association between the involved objects. This may occur multiple times with different objects forming an interaction topology of different objects. One preferred implementation of an interaction condition is object proximity or contact. For example, the interaction condition can define a minimum proximity threshold for a user's hands and objects. When one of the user's hands comes within a certain distance of an object, then an object association may be established. Some variations may include applying CV-based approaches to detect some object interaction such as detecting hand grasping of an object, detecting a hand pointing to an object, detecting direct visual attention, detecting a user standing on an object, or other suitable directed gestures.
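
A minimal sketch of the proximity-based interaction condition described above, assuming the CV pipeline already yields a hand keypoint and object bounding boxes in shared image coordinates; the data shapes and threshold are illustrative assumptions.

```python
# Illustrative sketch of a proximity-based interaction condition: an object
# association is established when a tracked hand comes within a threshold
# distance of a classified object's bounding box. Shapes and threshold assumed.
from dataclasses import dataclass
import math

@dataclass
class DetectedObject:
    object_id: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

def distance_to_box(px: float, py: float, obj: DetectedObject) -> float:
    """Euclidean distance from a point to the nearest edge of the bounding box."""
    dx = max(obj.x_min - px, 0.0, px - obj.x_max)
    dy = max(obj.y_min - py, 0.0, py - obj.y_max)
    return math.hypot(dx, dy)

def associations_for_hand(hand_xy, objects, threshold_px: float = 40.0):
    """Return object ids whose boxes lie within the proximity threshold of the hand."""
    px, py = hand_xy
    return [o.object_id for o in objects if distance_to_box(px, py, o) <= threshold_px]

shelf_objects = [DetectedObject("SKU-0001-CEREAL", 100, 200, 180, 320),
                 DetectedObject("SKU-0002-SOUP", 400, 210, 460, 300)]
print(associations_for_hand((190, 250), shelf_objects))  # ['SKU-0001-CEREAL']
```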

Detecting associative interaction events within an associative interaction framework may detect a variety of types of interactions that vary along different dimensions. The combination of objects, the state of active objects, the type of context-loaded object, user-associated data, additional detected user input, and/or other factors may be used in different ways to determine a resulting action. As discussed herein, various forms of interactions may be detected depending on whether the user establishes associations with one or more objects, with a passive object (e.g., a product), with one or more active objects, with context-loaded objects, and/or alongside other user inputs.

For an associative interaction of a user with a single object, block S130 can include, through computer vision analysis of the image data, detecting a first object association of the one user with a first object S132, and initiating an associative interaction event with a set of interaction properties including properties of the user and the first object association.

In some variations, the initiation of an associative interaction event is preferably in response to satisfying an interaction condition. For example, some interaction frameworks may involve particular conditions that restrict when an associative interaction event is considered. The interaction condition can be related to duration (e.g., needing to detect a sustained object association for some minimum amount of time), an accompanying detected event (e.g., detecting a second user input like a voice command or CV-detected hand gesture), and/or other factors.

For an associative interaction of a user with two or more objects, block S130 can include, through computer vision analysis of the image data, detecting a first object association of the one user and a first object S132; through computer vision analysis of the image data, detecting a second object association of the one user with a second object S134; and initiating an associative interaction event with a set of interaction properties including properties of the user, the first object association, and the second object association S136. Any suitable number of objects may have detected object associations. In general, object associations are established through hand contact. Multiple objects may have object associations established from a single hand. Additionally, the legs of the user or other body parts or techniques could similarly be used to establish an association. As mentioned herein, associations can be transitive so a chain of objects can all be part of an interaction topology. In one example, a user may place multiple items in a basket or cart and then with another hand perform some hand gesture satisfying the interaction condition, thereby triggering the associative interaction event. All the products in the basket could be incorporated into the resulting action. As an example, this could be performed by a user to request the total price of the products, total nutritional information (e.g., total caloric count), or a bulk check of all items for certain properties.
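
The following sketch shows one way transitive associations of this kind might be represented as a small interaction topology; the graph structure and helper names are assumptions used only to illustrate collecting a chain of associated objects (user, cart, products) into one event.

```python
# Illustrative sketch of a transitive interaction topology: associations are
# edges, and all objects reachable from the user are folded into one event.
# The structure and names are assumptions for illustration.
from collections import defaultdict, deque

class InteractionTopology:
    def __init__(self):
        self._edges = defaultdict(set)

    def associate(self, a: str, b: str) -> None:
        """Record a bidirectional object association (e.g., hand contact)."""
        self._edges[a].add(b)
        self._edges[b].add(a)

    def objects_associated_with(self, root: str) -> set:
        """Return every object transitively associated with `root`."""
        seen, queue = {root}, deque([root])
        while queue:
            node = queue.popleft()
            for neighbor in self._edges[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        seen.discard(root)
        return seen

topology = InteractionTopology()
topology.associate("user-42", "cart-7")          # user pushes the cart
topology.associate("cart-7", "SKU-0001-CEREAL")  # items placed in the cart
topology.associate("cart-7", "SKU-0002-SOUP")
print(topology.objects_associated_with("user-42"))
# {'cart-7', 'SKU-0001-CEREAL', 'SKU-0002-SOUP'}
```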

In detecting an object association, the type of associative object, the identity or classification of the object, and other object-related properties may be determined. The associative object can be a passive object, a context-loaded object (i.e., a contextual object), an active object, and/or any suitable type of associative object. The identity or classification relates to an identifier or label for the object such as its SKU identifier, device identifier, or type of context-loaded object.

A passive object will generally be static in the environment. Its identity is preferably directly related to its properties. In a commerce-based use case, the first, second, and/or any object can be a product (e.g., goods for sale within the environment). For example, products like various cereal varieties, cans of soup, drink bottles, bags of snacks, and/or other grocery items are generally passive objects. In an example where two object associations are associated with two products, the resulting interaction event may initiate an action that compares the products. Accordingly, executing the action of block S140 can include generating a comparison of at least one product attribute of the first object and the second object and presenting the comparison through an output device in proximity to the user as shown in FIG. 8.

Context-loaded objects as described above have pre-defined meaning in relationship to an interaction event. For example, their identity or the way a user engages with the context-loaded object can alter the interpretation of the interaction event. In one example, the method may include detecting a price-check context-loaded object, detecting a nutritional information context-loaded object, detecting an action-trigger context-loaded object (e.g., to initiate a specified action like adding to a wishlist), and/or any other suitable type of context-loaded object.

A single context-loaded object may additionally have various subregions or visually detectable features or mechanisms where the detected nature of interaction with the context-loaded object can modify the associative interaction event. For such a type of object, block S130 may include visually detecting the state of interaction with the context-loaded object and adding the detected state to the interaction properties, thereby modifying the nature of the interaction event. In one example, this may include visually estimating the touch location on the context-loaded object, mapping that location to a pre-configured property, and setting the pre-configured property in the interaction properties used to determine the resulting action.

In detecting an active object, the system may identify a communication endpoint associated with the active device and initiate communication with the active device. This may be performed to request state information or other forms of data from the active device. Alternatively, the active device may have been proactively relaying information to the interaction platform coordinating the interactions. In this case, the state information and/or other appropriate data may already be accessible such that the appropriate data can be accessed based on the identity of the device. Accordingly, the method may include receiving data from an active object, wherein the interaction properties include the received data as shown in FIG. 9.
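
As a hedged sketch of this request/response exchange, the endpoint URL, payload fields, and choice of an HTTP state query are all assumptions for illustration; the description does not prescribe a particular protocol or API.

```python
# Illustrative sketch only: requesting state from an active object over HTTP
# and folding the response into the interaction properties. The endpoint URL,
# payload fields, and protocol choice are assumptions, not the described API.
import json
import urllib.request

def fetch_active_object_state(endpoint_url: str) -> dict:
    """Request the current state of an active object from its endpoint."""
    with urllib.request.urlopen(endpoint_url, timeout=2.0) as response:
        return json.loads(response.read().decode("utf-8"))

def build_interaction_properties(user_id: str, object_id: str, endpoint_url: str) -> dict:
    properties = {"user_id": user_id, "object_id": object_id}
    try:
        # e.g., {"app_state": "nutrition_mode", "volume": 4}
        properties["active_object_state"] = fetch_active_object_state(endpoint_url)
    except OSError:
        # The device may be unreachable; the event can proceed without its state.
        properties["active_object_state"] = None
    return properties

# Hypothetical usage with a device registered to the interaction platform.
# props = build_interaction_properties("user-42", "kiosk-3",
#                                      "http://kiosk-3.example.local/state")
```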

In one example of an associative interaction event with two object associations, the first object can be an active object. As an active object, it is preferably in communication with the interaction platform or another suitable system component managing interactions. In one variation, the active object is a personal computing device wherein the user may enter some information or set some configuration. In another variation, the active object may be a computing device installed in or belonging to the environment. The data of the personal computing device is relayed to the interaction platform and associated with the associative interaction event, whereby the interaction properties are set in part based on data from the active object. The resulting action may be altered based on the received data. In a similar example, a second object or more objects can be active objects in communication with the interaction platform.

Active objects involved in an associative interaction event may additionally or alternatively be involved in an action resulting from the associative interaction event. This may be in addition to communicating data to an active object (e.g., a connected device). For example, an associative interaction event involving a screen and a product may result in user-relevant product information being displayed on that screen.

In one variation, detecting an associative interaction event can additionally include detecting a user input, wherein the associative interaction properties include properties of the user input. The user input is preferably used to modify the interaction event. In one implementation, the user input is added as an interaction property. The user input could alternatively be used in the detection or triggering of an interaction event. For example, an explicit user input may be required to trigger an associative interaction event.

User input can be detected in a variety of ways. In one variation, detecting the user input can include, through computer vision analysis of the image data, detecting a gesture mapped to a modifier property. The gesture could be a hand gesture, a head or facial gesture, a foot/leg gesture, and/or any suitable type of gesture. Detected user input may be used in combination with a specific object association. For example, a user touching an item with two fingers out may have a specific interaction event property. In some implementations this could be different from a user gesturing two fingers out with one hand and touching the item with the other hand.

In another variation, detecting the user input can include recording audio and detecting a voice command in the audio as the user input as shown in FIG. 10. A microphone of a user device or a site-installed device may record the audio. In this way, a user may issue verbal or audible commands in coordination with establishing object associations.

Different stages of the interaction event may be triggered as individual events. This can further expand the adaptability of building unique digital experiences on the associative interaction framework. For example, block S130 may include generating an associative interaction began event, generating an associative interaction modifier change event, generating an associative interaction end event, and/or generating an associative interaction cancel event. For example, a digital experience may initiate some audio or visual signal that an associative interaction is engaged but not finalized, which can function to enable the user to modify, commit, or cancel the interaction. This can be used to give users more control and allow accidental interactions to be appropriately resolved.
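
A minimal sketch of how these lifecycle stages could be exposed as discrete events with attachable handlers; the event names mirror the stages above, while the dispatcher and handler names are illustrative assumptions.

```python
# Illustrative sketch: dispatching the lifecycle stages of an associative
# interaction (began, modifier change, end, cancel) to registered handlers.
# The dispatcher and handler names are assumptions for illustration.
from collections import defaultdict
from enum import Enum

class InteractionStage(Enum):
    BEGAN = "associative_interaction_began"
    MODIFIER_CHANGED = "associative_interaction_modifier_change"
    ENDED = "associative_interaction_end"
    CANCELLED = "associative_interaction_cancel"

class InteractionEventBus:
    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, stage: InteractionStage, handler) -> None:
        """Attach a handler that runs when the given stage is emitted."""
        self._handlers[stage].append(handler)

    def emit(self, stage: InteractionStage, properties: dict) -> None:
        for handler in self._handlers[stage]:
            handler(properties)

bus = InteractionEventBus()
bus.on(InteractionStage.BEGAN,
       lambda p: print(f"cue user {p['user_id']}: interaction engaged, not finalized"))
bus.on(InteractionStage.CANCELLED,
       lambda p: print(f"discard pending action for user {p['user_id']}"))

bus.emit(InteractionStage.BEGAN, {"user_id": "user-42", "object_id": "SKU-0001-CEREAL"})
bus.emit(InteractionStage.CANCELLED, {"user_id": "user-42"})
```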

Detecting associative interaction events is preferably performed for at least one user. More preferably, the process of detecting an associative interaction is executed simultaneously for at least a subset of users detected in the environment. Interaction events can preferably occur independent of other users and with any suitable overlap of the events. The resulting actions are also preferably isolated to the digital experiences of the associated user. For example, a first customer performing an associative interaction will have that event alter the digital experience of just the first customer, while a second customer performing a second associative interaction will have that event alter the digital experience of just the second customer. Though the actions could similarly be used to perform some action in connection with a second user.

Block S140, which includes executing an action response based on the associative interaction event, functions to respond to the triggered event. Block S140 can involve altering the state of at least one computing device, communicatively coupled to the environment monitoring system, based on the associative interaction properties in association with the one user. This computing device may be a component of the interaction platform, a user computing device, a computing device in the environment, or any suitable computing device.

As one preferred type of action response, information may be conveyed visually, through audio, or in any suitable format. Information delivery can be used so that users can more easily retrieve information about the items in the environment. Another preferred type of action response includes performing some transaction or change in a computing system. This may be used to update a data structure, initiate some second interaction (e.g., making an order), or perform any suitable change. Other types of actions may also be configured.

The action response is preferably based on the interaction properties. The action response may be explicitly specified through the properties of the interaction event. Alternatively, the interaction platform or a suitable managing system may evaluate the interaction event and appropriately initiate a corresponding response. As described above, various event notifications may be triggered. Different methods could watch or attach to these events so that they execute during the event occurrence. The response may include no action, a single action, or multiple actions.

As a first variation, executing an action response can include performing internal updates within the interaction platform. Executing an action response in this variation may include sending a communication to a remote server such as an interaction cloud platform or a local computing system. This action may be performed transparently to the user. For example, associative interactions with a set of products may be used in transparently adding the products to a “recently viewed” list for the associated user.

In a commerce-based environment, an internal system action can be used to place a digital order through physical interactions in a store. In one implementation, a user can pre-configure their account with user interaction configuration that includes a payment mechanism and a default shipping address. When the user performs an appropriate associative interaction involving a product, the action can be placing an order in the internal system. More specifically, executing the action response can include automatically placing a delivery order for the product, wherein shipping information and payment information are specified through the user interaction configuration as shown in FIG. 11.

As another variation, executing an action response based on the associative interaction event comprises communicating the action response from the interaction platform to the targeted computing device. In some variations, the targeted device may be an active object involved in the associative interaction event. In other cases, a suitable target device as a destination for carrying out the action may have to be identified. In this variation, executing the action response can include selecting a target device, communicating the action response to the target device, and executing, performing, or otherwise carrying out the action response at the target device as shown in FIG. 12.

Selecting a target device functions to identify the appropriate device for a particular user. In one variation, selecting a target device includes selecting a user device for the associated user. An application ID or device ID is preferably associated with a user account and used in addressing the notification. If the user has a registered computing device (e.g., a device with the appropriate application installed and linked to their user account), then that device can be selected as a target for the action response. As an example, this can be used in updating an application to reflect the occurrence of the event. The application or device may have logic to process and interpret the associative interaction. When the personal computing device of a user is a phone, block S140 may direct actions on the device such as displaying information, playing audio, initiating an alert/notification, executing some action within the phone, providing tactile feedback, or performing any suitable action. When the personal computing device is a smart watch, similar actions may be initiated on the device. When the personal computing device is a set of connected headphones, block S140 preferably directs actions such as playing audio, initiating tactile feedback, initiating recording, or other suitable actions. When the personal computing device is a pair of smart glasses, block S140 may direct actions on the device like the phone, including presenting an augmented or virtual reality rendering (e.g., presenting information) or performing any other suitable action.

In another variation, selecting a target device can include selecting a site-installed device that satisfies a user feedback proximity condition as shown in FIG. 8. In one variation, the user feedback proximity condition is based on distance from the user to the device. The condition may additionally evaluate a user's direction of attention, visibility, proximity of other users, and/or other suitable factors. A site-installed device can include a display, a speaker, a computing kiosk, or other suitable devices. For example, this could be used in displaying an item price check and a virtual cart total on a “price check” display installed in the store. The site-installed device can be an active device involved in the interaction. For example, the destination device could be a display device that the user tapped. Alternatively, it could be a device detected to be in the vicinity of the user. The site-installed device may alternatively not be considered a potential active device. For example, a speaker may be out of reach of the user and out of view of the user and therefore not an active object.
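
The following is a speculative sketch combining the two selection variations above: prefer a registered user device, otherwise fall back to the nearest site-installed device satisfying a proximity condition. The registries, fields, and threshold are hypothetical.

```python
# Illustrative sketch of selecting a target device for the action response:
# prefer the user's registered device; otherwise choose the nearest
# site-installed device within a feedback proximity threshold. The registries,
# fields, and threshold are hypothetical assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SiteDevice:
    device_id: str
    kind: str                 # "display", "speaker", "kiosk", ...
    distance_to_user_m: float

FEEDBACK_PROXIMITY_M = 3.0

def select_target_device(user_account: dict, nearby_devices: list) -> Optional[str]:
    """Return the device id that should receive the action response, if any."""
    # Variation 1: a registered personal device linked to the user's account.
    if user_account.get("registered_device_id"):
        return user_account["registered_device_id"]
    # Variation 2: the closest site-installed device satisfying the proximity condition.
    candidates = [d for d in nearby_devices if d.distance_to_user_m <= FEEDBACK_PROXIMITY_M]
    if candidates:
        return min(candidates, key=lambda d: d.distance_to_user_m).device_id
    return None  # no suitable output; the response may be skipped or queued

devices = [SiteDevice("price-check-display-2", "display", 1.8),
           SiteDevice("aisle-speaker-5", "speaker", 6.0)]
print(select_target_device({"registered_device_id": None}, devices))
# price-check-display-2
```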

Upon selecting a target device, a communication is preferably transmitted to the targeted device. The communication may include the interaction properties, an instruction, or any suitable data. The target device can perform one or more actions in response to the communication.

Preferably, the communication to the device is used to relay information to the user. The communication may include a message, an image, or any suitable media or instruction to present media. The associative interaction framework can enable the type of information to be controlled by the user in a variety of ways. In one variation, a user can customize their experience to their particular desires. Preferably, a user can set user interaction configuration to select various options. Detecting an associative interaction event may include accessing user interaction configuration that specifies at least one category of a set of categories of information; executing the action then comprises outputting object information corresponding to the user interaction configuration. For example, a user may select what type of information they want delivered for different types of associative interaction events. One user may want to receive pricing information while another user may configure their account to receive nutritional information.
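
As a non-authoritative sketch, a user interaction configuration of this kind might be a simple mapping from information categories to per-user preferences that the action response consults when formatting its output; the category names and product fields below are illustrative assumptions.

```python
# Illustrative sketch: a per-user interaction configuration selecting which
# category of product information to output in the action response.
# Category names and product fields are assumptions for illustration.
USER_INTERACTION_CONFIG = {
    "user-42": {"info_category": "price"},
    "user-77": {"info_category": "nutrition"},
}

PRODUCT_CATALOG = {
    "SKU-0001-CEREAL": {
        "name": "Oat Rings Cereal",
        "price": "Price: $3.49 ($0.25/oz)",
        "nutrition": "Nutrition: 120 kcal per serving, 9 g sugar",
    },
}

def format_action_output(user_id: str, sku: str) -> str:
    """Compose the information to deliver, honoring the user's configured category."""
    category = USER_INTERACTION_CONFIG.get(user_id, {}).get("info_category", "price")
    product = PRODUCT_CATALOG[sku]
    return f"{product['name']} - {product[category]}"

print(format_action_output("user-42", "SKU-0001-CEREAL"))  # price-focused output
print(format_action_output("user-77", "SKU-0001-CEREAL"))  # nutrition-focused output
```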

As shown in the exemplary scenario of FIG. 2A, a single-object associative interaction may involve a user touching a product. For general objects such as products in a store, this form of associative interaction may add the touched product to a “product history” for the user. In another variation, establishing an object association with a single product may add the product to a list like a wishlist or an “order later” list, or categorize the product in any suitable manner. Other actions could similarly be used. For more specialized objects, such a single-object associative interaction may trigger an object-specific event. For example, touching a context-loaded object that signifies “service request” may trigger a notification to a worker to assist a customer.

As shown in the exemplary scenario of FIG. 2B, a multi-object associative interaction may involve a user touching two products simultaneously. In one implementation this may trigger a product comparison through a connected interface such as a connected speaker/headphones or a display as described above. For example, in holding up two products a product comparison could be presented as, “The product in your right hand is the better purchase by price, but the product in your left hand is more highly recommended and has lower sodium content”. FIG. 2B shows the associative interaction as touching with each hand, but it may additionally include touching two objects with one hand, using a user's feet/legs to touch an object, or holding an object in a user-associated object (e.g., a basket/bag). Multi-object associative interactions can similarly be modified through user interaction modifiers, contextually loaded objects, user-associated data, and/or other factors. As shown in the exemplary scenario of FIG. 2C, an associative interaction may involve a user with a preconfigured preference for nutritional information. When the user touches two products, the product comparison may default to providing a nutritional comparison.

Contextually loaded objects may also be used in other forms of associative interactions. As shown in the exemplary scenario of FIG. 3B, an associative interaction may involve a user touching one product and a second contextually loaded object like a pricing request decal. In this example, a price check can be executed for the product and presented through a connected interface. In another variation, the contextually loaded object could be an “order” decal, and this may be used to add the product to an order such as a home delivery order. This may function to enable a customer to shop “on-line” from within a store.

In some cases, user data may be used in combination with contextually loaded objects. As shown in the exemplary scenario of FIG. 3C, an associative interaction may involve a user with preconfigured nutritional information (e.g., allergies, food restrictions, nutritional goals, etc.) and an associated virtual cart. When the user touches a product and a nutrition decal, this could trigger the presentation of nutrition facts relevant to the user and/or nutrition facts in the context of the cart. For example, a connected display or audio interface could inform the user how that product impacts the cumulative nutritional value of their cart.

In another variation, contextually loaded objects may be used in combination to perform more complex forms of user input. As shown in the exemplary scenario of FIG. 3D, an associative interaction may involve detecting a user touching two contextually loaded objects: a control action decal and an application decal. This can trigger the relevant control action on the specified application.

As shown in the exemplary scenario of FIG. 5, an associative interaction may involve a user performing some modifying gesture when touching a product. For example, the user could selectively hold out one, two, or three fingers to trigger different actions. In one implementation, no gesture will default to providing a price check, one finger will trigger a nutritional check, two fingers can trigger similar product recommendations, and three fingers can add the item to a wishlist.

Associative interactions could additionally result in particular events when used with active objects. As shown in the exemplary scenario of FIG. 4B, an associative interaction may involve a user touching a connected store display. This may trigger the display of user-associated information like the user's current cart contents, profile information, and/or other information. In a multi-object variation as shown in FIG. 4A, a user may touch a display and a product, which can trigger the display of product-related information.

As one particular example shown in FIG. 4C, an associative interaction may involve a user touching two different active objects. This may trigger some interaction between the two active objects. For example, this may result in the synchronization or communication of data between the two devices.

The systems and methods of the embodiments can be embodied and/or implemented at least in part as a machine configured to receive a computer-readable medium storing computer-readable instructions. The instructions can be executed by computer-executable components integrated with the application, applet, host, server, network, website, communication service, communication interface, hardware/firmware/software elements of a user computer or mobile device, wristband, smartphone, or any suitable combination thereof. Other systems and methods of the embodiment can be embodied and/or implemented at least in part as a machine configured to receive a computer-readable medium storing computer-readable instructions. The instructions can be executed by computer-executable components integrated with apparatuses and networks of the type described above. The computer-readable medium can be stored on any suitable computer-readable media such as RAMs, ROMs, flash memory, EEPROMs, optical devices (CD or DVD), hard drives, floppy drives, or any suitable device. The computer-executable component can be a processor, but any suitable dedicated hardware device can (alternatively or additionally) execute the instructions.

As a person skilled in the art will recognize from the previous detailed description and from the figures and claims, modifications and changes can be made to the embodiments of the invention without departing from the scope of this invention as defined in the following claims.

We claim:
 1. A method for an interaction framework used for user input within an interaction platform used in an environment, the method comprising: collecting image data in the environment; through computer vision analysis of the image data, classifying objects in the environment wherein a plurality of the objects are detected users; for at least one user, detecting an associative interaction event, which comprises: through computer vision analysis of the image data, detecting a first object association of the one user with a first object, and initiating the associative interaction event with a set of interaction properties including properties of the user and the first object association; and executing an action response based on the associative interaction event.
 2. The method of claim 1, wherein collecting image data comprises collecting image data from a plurality of imaging devices that are distributed across the environment, wherein at least a subset of the image data is collected from imaging devices with an aerial perspective.
 3. The method of claim 2, wherein the process of detecting an associative interaction event is executed simultaneously for at least a subset of users detected in the environment.
 4. The method of claim 1, wherein detecting an associative interaction event for the at least one user comprises, through computer vision analysis of the image data, detecting a second object association of the one user with a second object; and wherein the set of interaction properties further includes properties of the second object.
 5. The method of claim 4, wherein the first object is an active object that is in communication with the interaction platform.
 6. The method of claim 5, further comprising wherein the set of interaction properties are set in part based on data of the active object that is communicated to the interaction platform.
 7. The method of claim 5, wherein executing an action response based on the associative interaction event comprises communicating the action response from the interaction platform to the active device.
 8. The method of claim 4, wherein the first object and the second object are goods for sale within the environment.
 9. The method of claim 8, wherein executing the action response comprises generating a comparison of at least one product attribute of the first object and the second object and presenting the comparison through an output device in proximity to the user.
 10. The method of claim 1, wherein the first object is a context-loaded object.
 11. The method of claim 10, wherein detecting the first object association further comprises visually detecting state of interaction with the context-loaded object and adding the detected state to the set of interaction properties.
 12. The method of claim 1, wherein detecting an associative interaction event further comprises detecting a user input, wherein the associative interaction properties include properties of the user input.
 13. The method of claim 12, wherein detecting the user input comprises, through computer vision analysis of the image data, detecting a hand gesture mapped to a modifier property.
 14. The method of claim 12, wherein detecting the user input comprises recording audio and detecting a voice command in the audio as the user input.
 15. The method of claim 1, wherein detecting an associative interaction event further comprises accessing user interaction configuration that specifies at least one category of a set of categories of information; and wherein executing the action comprises outputting object information corresponding to the user interaction configuration.
 16. The method of claim 1, wherein the first object is a product; and wherein executing the action response comprises automatically placing a delivery order for the product, wherein shipping information and payment information are specified through user interaction configuration associated with the user.
 17. The method of claim 1, wherein executing the action response further comprises performing internal updates of a computing device within the interaction platform.
 18. The method of claim 1, wherein executing the action response further comprises selecting a target device, communicating the action response to the target device, and executing the action response at the target device.
 19. The method of claim 18, wherein selecting a target device comprises selecting a user device associated with the user.
 20. The method of claim 1, wherein executing the action response further comprises selecting a site-installed device satisfying a user feedback proximity condition and communicating with the site-installed device.
 21. A method for an interaction framework used for user input in a store environment comprising: collecting image data from a plurality of imaging devices that are distributed across an environment; through computer vision analysis of the image data, tracking a plurality of users; classifying objects in the store environment wherein at least a subset of the objects are products for sale in the environment; for at least one user, detecting an associative interaction event, which comprises: through computer vision analysis of the image data, detecting a first object association of the one user and a first product, through computer vision analysis of the image data, detecting a second object association of the one user and a second object, wherein the properties of the user, the first object association, and the second object association form a set of interaction properties, and initiating an associative interaction event with the set of interaction properties; and altering the state of at least one computing device based on the set of interaction properties in association with the one user.
 22. A system for an interaction framework used for user input within an interaction platform used in an environment comprising: a computer vision monitoring system that includes a plurality of imaging devices distributed at distinct locations across an environment, wherein at least a subset of imaging devices are mounted in an aerial location; wherein the computer vision monitoring system is further configured to: collect image data, classify objects in the environment which includes detection of a set of users that are present in the environment, for at least one user, detect a first object association between the user and a first classified object, detect a second object association between the user and a second classified object, and initiate an associative interaction event with properties of the first classified object, the second classified object, and the user; and an interaction platform configured to initiate execution of a response to the associative interaction event with at least one computing device.